Inductive Logic: From Data Analysis to Experimental Design
Abstract
In celebration of the work of Richard Threlkeld Cox, we explore inductive logic and its role in science, touching on both experimental design and the analysis of experimental results. In this exploration we demonstrate that the duality between the logic of assertions and the logic of questions has important consequences. We discuss the conjecture that the relevance or bearing, $b$, of a question on an issue can be expressed in terms of the probabilities, $p$, of the assertions that answer the question via the entropy. However, symmetry requires that the probability of an assertion be expressible in terms of the relevance of the questions to which that assertion is an answer, and that the form of the relation be the same. Thus if relevance is $-\sum p \log p$, or entropy, then $-\sum b \log b$ is a probability. In its application to the scientific method, the logic of questions, inductive inquiry, can be applied to design an experiment that most effectively addresses a scientific issue. This is performed by maximizing the relevance of the experimental question to the scientific issue to be resolved. It is shown that these results are related to the mutual information between the experiment and the scientific issue, and that experimental design is akin to designing a communication channel that most efficiently communicates information relevant to the scientific issue to the experimenter. Application of the logic of assertions, inductive inference (Bayesian inference), completes the experimental process by allowing the researcher to make inferences based on the information obtained from the experiment.

THE LOGIC OF INFERENCE AND INQUIRY

These workshops have spanned over two decades of research during which the power of Bayesian (or inductive) inference has been demonstrated time and time again. Slowly but surely, these techniques have become more accepted in mainstream science, with applications in virtually every field.
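The abstract's claim that experimental design is related to the mutual information between the experiment and the scientific issue can be illustrated with a small sketch. It is not the paper's own procedure: the function names, the two-hypothesis prior, and the three candidate experiments (each given as a hypothetical likelihood table p(d | h)) are all assumptions made for illustration, and mutual information is used here as a stand-in score for the relevance of an experimental question.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p log p (natural log); zero terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(prior, likelihood):
    """I(H; D) for prior p(h) and likelihood[h][d] = p(d | h)."""
    n_h = len(prior)
    n_d = len(likelihood[0])
    # Evidence p(d) = sum_h p(h) p(d | h)
    evidence = [
        sum(prior[h] * likelihood[h][d] for h in range(n_h)) for d in range(n_d)
    ]
    expected_posterior_entropy = 0.0
    for d in range(n_d):
        if evidence[d] == 0:
            continue  # outcome d never occurs, so it contributes nothing
        # Bayes' theorem: p(h | d) = p(h) p(d | h) / p(d)
        posterior = [prior[h] * likelihood[h][d] / evidence[d] for h in range(n_h)]
        expected_posterior_entropy += evidence[d] * entropy(posterior)
    # Expected reduction in uncertainty about the hypothesis
    return entropy(prior) - expected_posterior_entropy


# Hypothetical prior over two hypotheses and three candidate experiments.
prior = [0.5, 0.5]
experiments = {
    "uninformative": [[0.5, 0.5], [0.5, 0.5]],
    "noisy": [[0.8, 0.2], [0.3, 0.7]],
    "decisive": [[1.0, 0.0], [0.0, 1.0]],
}
scores = {name: mutual_information(prior, lik) for name, lik in experiments.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Under these assumptions the experiment whose outcomes separate the hypotheses perfectly scores highest, and the one whose outcomes are independent of the hypothesis scores zero, which matches the communication-channel picture described in the abstract.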
Even as I write, the Office Assistant on this word processor, which uses a Bayesian network to infer my intentions from my actions, is offering a suggestion to help me with the formatting of this document. It is performing the equivalent of data analysis, which is arriving at the most probable conclusions given one's prior knowledge and newly acquired data. While data analysis is an extremely important part of scientific investigation, its counterpart, experimental design, is equally important. Intuitively, the problem of experimental design, which consists of choosing an experimental question most relevant to the scientific issue to be resolved, is related to data analysis. However, there does not yet exist a complete theory of the logic of inference and inquiry. The goal of this paper is to introduce the reader to the overarching framework of inductive logic, to describe what is known regarding the relationships between inference, inquiry, probability theory and information theory, and to highlight what is not known.

Deductive and Inductive Inference

As deductive inference refers to implication among logical assertions in situations of complete certainty, we begin with Boolean logic. An assertion $a$ implies an assertion $b$, written $a \to b$, if $a \wedge b = a$ and $a \vee b = b$, where $\wedge$ is the logical and operation such that $a \wedge b$ is an assertion that tells what $a$ and $b$ tell jointly, and $\vee$ is the logical or operation such that $a \vee b$ is an assertion that tells what $a$ and $b$ tell in common. As an example, consider the two assertions $a$ = "It is a Kangaroo!" and $b$ = "It is an Animal!". The assertion $a$ implies the assertion $b$, as jointly the two assertions say "It is a Kangaroo!". In addition, the common assertion $a \vee b$ says "It is an Animal!". Table 1 (below) lists the Boolean identities for assertions.

Richard T.
Cox's major contribution [1,2] to inductive inference arises from generalizing Boolean implication to implication of varying degree, where the real number representing the degree to which the implicate $b$ is implied by the implicant $a$ is written as $(a \to b)$. The inferential utility of this formalism is readily apparent when the implicant is an assertion representing a premise and the implicate is an assertion representing a hypothesis. From the associativity of the conjunction of assertions,

$$(a \to (b \wedge c) \wedge d) = (a \to b \wedge (c \wedge d)),$$

Cox derived a functional equation, which has as a particular solution

$$(a \to b \wedge c) = (a \to b)\,(a \wedge b \to c). \tag{1}$$

In addition, if you know something about an assertion, you also know something about its contradictory. In other words, the degree to which a premise implies an assertion $b$ determines the degree to which the premise implies its contradictory $\sim b$. This logical principle can be applied twice to obtain a functional equation, which has as a particular solution

$$(a \to b) + (a \to \sim b) = 1. \tag{2}$$

In general the first functional equation puts some constraints on the second, which results in a general solution

$$(a \to b \wedge c)^r = (a \to b)^r\,(a \wedge b \to c)^r \tag{3}$$

$$(a \to b)^r + (a \to \sim b)^r = C, \tag{4}$$

where $r$ and $C$ are arbitrary constants. Setting $r = C = 1$, one obtains the particular solutions above. Cox demonstrated that this measure of relative degree of implication among assertions is the unique logically consistent measure.

Footnote 1: Here we adopt the notation used by Cox, where an assertion is denoted by a lowercase Roman character and a question is denoted by an uppercase Roman character. In addition, we adopt the notation used by Fry, where assertions are stated with exclamation marks and questions with question marks.

We do well to define probability as this relative degree of implication among assertions.
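Equations (1) and (2) can be checked numerically on a toy model. The sketch below is only an illustration, not Cox's derivation: it assumes the degree of implication is realized as a ratio of weights over the four truth assignments of $(b, c)$ consistent with the premise $a$, and the weights and helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical weights over the truth assignments of (b, c),
# all taken to be consistent with the premise a.
weights = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(4, 10),
}


def always(b, c):
    return True


def is_b(b, c):
    return b


def is_c(b, c):
    return c


def not_b(b, c):
    return not b


def b_and_c(b, c):
    return b and c


def degree(implicate, implicant=always):
    """Degree (a AND implicant -> implicate), modeled as a ratio of weights."""
    support = sum(w for world, w in weights.items() if implicant(*world))
    joint = sum(
        w for world, w in weights.items() if implicant(*world) and implicate(*world)
    )
    return joint / support


# Equation (1): (a -> b AND c) = (a -> b) (a AND b -> c)
lhs_product = degree(b_and_c)
rhs_product = degree(is_b) * degree(is_c, implicant=is_b)

# Equation (2): (a -> b) + (a -> ~b) = 1
sum_rule = degree(is_b) + degree(not_b)

print(lhs_product, rhs_product, sum_rule)
```

Exact rational arithmetic is used so that both sides of each equation can be compared for equality rather than to within a floating-point tolerance.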
In fact, a simple change of notation, $p(b \mid a) \equiv (a \to b)$, reveals that equations (1) and (2) above,

$$p(b \wedge c \mid a) = p(b \mid a)\,p(c \mid a \wedge b) \tag{5}$$

$$p(b \mid a) + p(\sim b \mid a) = 1, \tag{6}$$

are the product and sum rules, respectively, of probability theory. Utilizing the commutativity of the conjunction of two assertions, $b \wedge c \equiv c \wedge b$, equation (5) can be applied to obtain

$$p(b \wedge c \mid a) = p(b \mid a)\,p(c \mid a \wedge b) \tag{7}$$

$$p(c \wedge b \mid a) = p(c \mid a)\,p(b \mid a \wedge c). \tag{8}$$

Equating the right-hand sides of (7) and (8), we obtain Bayes' Theorem

$$p(b \mid a \wedge c) = p(b \mid a)\,\frac{p(c \mid a \wedge b)}{p(c \mid a)}, \tag{9}$$

which allows one to evaluate the probability of a hypothesis given one's prior knowledge and newly acquired data. The foundation of data analysis rests on this theorem.

Two important points should be noted. First, this formalism allows one to perform inductive inference over a broad range of applications. Given a set of assertions, this calculus allows one to determine the relative degree to which any assertion implies any other. This is far beyond the scope supported by frequentist statistics. Second, there cannot be implication without an implicant. In short, probabilities are always conditional on some state of prior knowledge.

Deductive and Inductive Inquiry

While it is possible to examine the logical relationships among what is known, it is equally possible to examine the logic of what is unknown. Cox's second major contribution [3] was to lay the foundations for the logic of questions. He defined a question as the set of assertions that answer the question. For example, the question $K$ = "In what state does my kangaroo live?" can be expressed in terms of assertions by